Abstract

In this note we introduce an estimate for the marginal likelihood associated to hidden Markov models (HMMs) 
using sequential Monte Carlo (SMC) approximations of the generalized two-filter smoothing decomposition 
[3]. This estimate is shown to be unbiased and a central limit theorem (CLT) is established. This latter 
CLT also allows one to prove a CLT associated to estimates of expectations w.r.t. a marginal of the joint 
smoothing distribution; these form some of the first theoretical results associated to the SMC approximation of 
the generalized two-filter smoothing decomposition. The new estimate and its application are investigated from a
numerical perspective.
1 Introduction

Hidden Markov models provide a flexible description of a wide variety of real-life phenomena; see [1]. An HMM
is a pair of discrete-time stochastic processes, $\{X_n\}_{n \geq 0}$ and $\{Y_n\}_{n \geq 1}$, where $X_n \in \mathbb{R}^{d_x}$ is an unobserved process
and $y_n \in \mathbb{R}^{d_y}$ is observed. The hidden process $\{X_n\}_{n \geq 0}$ is a Markov chain with initial density $\delta_{x_0}$ at time 0 and
transition density $f_\theta$ with $\theta \in \Theta \subseteq \mathbb{R}^{d_\theta}$, i.e. $\mathbb{P}_\theta(X_0 \in A) = \delta_{x_0}(A)$ and
$\mathbb{P}_\theta(X_n \in A | X_{n-1} = x_{n-1}) = \int_A f_\theta(x_n | x_{n-1}) dx_n$, $n \geq 1$, where $\mathbb{P}_\theta$ denotes probability, $A \subseteq \mathbb{R}^{d_x}$,
$\delta_{x_0}$ is the Dirac measure with mass at $x_0$, and $dx_n$ is an assumed dominating measure. In addition, the observations
$\{Y_n\}_{n \geq 1}$ conditioned upon $\{X_n\}_{n \geq 0}$ are statistically independent and have marginal density $g_\theta(y_n | x_n)$, i.e.
$\mathbb{P}_\theta(Y_n \in B | \{X_k\}_{k \geq 0} = \{x_k\}_{k \geq 0}) = \int_B g_\theta(y_n | x_n) dy_n$, $n \geq 1$, with $B \subseteq \mathbb{R}^{d_y}$ and $dy_n$ the dominating measure.
The HMM described above is often referred to in the literature as a state-space model. Here $\theta$ is a static parameter,
which is fixed throughout, and we shall only be concerned with the scenario in which one observes a batch data set
$y_{1:T} := (y_1, \dots, y_T)$. The joint density of the observations, $p_\theta(y_{1:T})$, is termed the marginal likelihood. For most
models of practical interest, this quantity cannot be evaluated exactly. A popular collection of approximation
techniques for HMMs, which can estimate the marginal likelihood, are SMC methods.

SMC techniques simulate a collection of $N$ samples in parallel, sequentially in time, and combine importance
sampling and resampling to approximate a sequence of probability distributions of increasing state-space known
up to a multiplicative constant; see [9] for an introduction. These techniques provide a natural estimate of the marginal
likelihood of HMMs (as well as of normalizing constants of Feynman-Kac representations; see [5]). The estimate
is quite well understood: it is known to be unbiased [5] and its relative variance is known to increase linearly
with $T$ [5], [12]. However, the standard SMC estimate is not the only one that can be considered. A relatively
recent procedure, designed for smoothing, is based upon the generalized two-filter decomposition (see e.g. [2] for the
two-filter smoothing decomposition). Roughly, the idea is to run two independent SMC algorithms, one forwards
(as before) and one backwards (which approximates a collection of appropriately defined target distributions), and
for them to 'meet' at some point. Using this procedure, one can obtain more efficient schemes for smoothing, relative
to standard SMC procedures. In the following note we:

1. Introduce a new estimate, costing $O(N)$, of the marginal likelihood using the generalized two-filter smoothing
decomposition.

2. Establish that this estimate is unbiased and prove a CLT, under some assumptions. 

3. Numerically investigate the estimate. 

It is remarked that via 2. we can also establish a CLT for an estimate of expectations w.r.t. a marginal of the joint 
smoothing distribution. 

This note is in two halves; the first focuses on the idea from a methodological perspective. The second is the 
proof of our results in point 2. The note is structured as follows: in Section 2 we discuss the estimate and our main
result. In Section 3 some simulations investigating the new estimate are given; in particular, some comparisons to
the forward filtering backward simulation (FFBSi) algorithm in [5]. The proofs of our results are housed in the
appendix.



2 SMC and Generalized Two-Filter Smoothing 
2.1 SMC Algorithm 

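We consider the joint smoothing distribution, with $\theta$ fixed:

$$\pi_\theta(x_{1:T} | y_{1:T}) = \frac{\prod_{n=1}^{T} g_\theta(y_n | x_n) f_\theta(x_n | x_{n-1})}{\int \prod_{n=1}^{T} g_\theta(y_n | x_n) f_\theta(x_n | x_{n-1}) dx_{1:T}}; \qquad (1)$$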

the denominator is denoted $p_\theta(y_{1:T})$; this is the marginal likelihood. We remark that throughout, the transition
and observation densities can be time-inhomogeneous, but we omit this from our notation. One can construct an
SMC algorithm to sample sequentially from $\pi_\theta(x_1 | y_1), \dots, \pi_\theta(x_{1:T} | y_{1:T})$. The idea is to use a collection of particles,
simulated in parallel, which are written $(\overrightarrow{X}_n^i)_{i \in \{1,\dots,N\}}$ to denote samples forward in time; the reason for the
notation will become apparent below. We will sometimes denote the index of a particle at time n by a l n , and we 

adopt the notation "a? n " = ~x*n ■ 

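• Step 1: For $i \in \{1,\dots,N\}$ sample $\overrightarrow{X}_1^i \sim q_{1,\theta}(\cdot)$ and compute the un-normalized weight:

$$\overrightarrow{W}_1^i = \frac{g_\theta(y_1 | \overrightarrow{x}_1^i) f_\theta(\overrightarrow{x}_1^i | x_0)}{q_{1,\theta}(\overrightarrow{x}_1^i)}.$$

For $i \in \{1,\dots,N\}$ sample $\overrightarrow{a}_1^i \in \{1,\dots,N\}$ from a discrete distribution on $\{1,\dots,N\}$ with $j$th probability
$\overrightarrow{w}_1^j = \overrightarrow{W}_1^j / \sum_{l=1}^{N} \overrightarrow{W}_1^l$; these represent the resampled particles. Set $n = 2$.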

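• Step 2: If $n = T + 1$ stop. Otherwise, for $i \in \{1,\dots,N\}$ sample $\overrightarrow{X}_n^i | \overrightarrow{x}_{n-1}^{\overrightarrow{a}_{n-1}^i} \sim q_{n,\theta}(\cdot | \overrightarrow{x}_{n-1}^{\overrightarrow{a}_{n-1}^i})$ and compute the
un-normalized weight:

$$\overrightarrow{W}_n^i = \frac{g_\theta(y_n | \overrightarrow{x}_n^i) f_\theta(\overrightarrow{x}_n^i | \overrightarrow{x}_{n-1}^{\overrightarrow{a}_{n-1}^i})}{q_{n,\theta}(\overrightarrow{x}_n^i | \overrightarrow{x}_{n-1}^{\overrightarrow{a}_{n-1}^i})}.$$

For $i \in \{1,\dots,N\}$ sample $\overrightarrow{a}_n^i \in \{1,\dots,N\}$ from a discrete distribution on $\{1,\dots,N\}$ with $j$th probability
$\overrightarrow{w}_n^j = \overrightarrow{W}_n^j / \sum_{l=1}^{N} \overrightarrow{W}_n^l$. Set $n = n + 1$ and return to the start of step 2.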

The estimate of the marginal likelihood is:

$$p_\theta^N(y_{1:T}) = \prod_{n=1}^{T} \left( \frac{1}{N} \sum_{i=1}^{N} \overrightarrow{W}_n^i \right). \qquad (2)$$
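The two steps above, and the product-of-average-weights estimate of the marginal likelihood, can be sketched in code. What follows is a minimal illustration under stated assumptions, not the implementation used in this note: it assumes a scalar toy model $X_n | x_{n-1} \sim \mathcal{N}(\phi x_{n-1}, \sigma_x^2)$, $Y_n | x_n \sim \mathcal{N}(x_n, \sigma_y^2)$, takes the proposal $q_{n,\theta}$ equal to the transition density $f_\theta$ (so that the weight reduces to $g_\theta$), and all function and parameter names are hypothetical.

```python
import numpy as np


def norm_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def bootstrap_log_marginal_likelihood(y, n_particles, phi, var_x, var_y, x0, rng):
    """Log of prod_n (1/N) sum_i W_n^i for the assumed toy model
    X_n | x_{n-1} ~ N(phi * x_{n-1}, var_x), Y_n | x_n ~ N(x_n, var_y)."""
    particles = np.full(n_particles, float(x0))
    log_estimate = 0.0
    for y_n in y:
        # Assumption: propose from the transition density, so the weight is g(y_n | x_n).
        particles = phi * particles + np.sqrt(var_x) * rng.standard_normal(n_particles)
        log_w = norm_logpdf(y_n, particles, var_y)
        # Average the un-normalized weights stably on the log scale.
        shift = log_w.max()
        w = np.exp(log_w - shift)
        log_estimate += shift + np.log(w.mean())
        # Multinomial resampling with j-th probability W^j / sum_l W^l.
        ancestors = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        particles = particles[ancestors]
    return log_estimate


rng = np.random.default_rng(1)
y_demo = rng.standard_normal(25)  # placeholder observations, not data from this note
log_p_hat = bootstrap_log_marginal_likelihood(y_demo, 500, 0.9, 1.0, 0.5, 0.0, rng)
```

The estimate is accumulated on the log scale purely for numerical stability; the unbiasedness discussed in the text refers to the estimate itself, not to its logarithm.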



2.2 Generalized Two-Filter Smoothing 

It is well-known that the above SMC algorithm does not approximate the joint smoothing distribution at all well. 
One technique which is known to assist the simulation procedure (at least empirically) is the SMC approximation 
of the generalized two-filter representation [3]. The algorithm works by defining two filters. One works as the SMC 
algorithm above and moves forward in time. The other works backward in time, on a sequence of densities defined 
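below. These two algorithms 'meet' at some pre-specified time $t \in \{1,\dots,T\}$.
Define the following sequence of densities (we use the convention $\prod_{\emptyset} = 1$):

$$\overleftarrow{\pi}_\theta(x_{n:T} | y_{n:T}) \propto \xi_{n,\theta}(x_n) g_\theta(y_n | x_n) \prod_{k=n+1}^{T} g_\theta(y_k | x_k) f_\theta(x_k | x_{k-1}), \quad n \in \{t,\dots,T\},$$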



where, at this stage, the $\xi_{n,\theta}$ are a sequence of (essentially) arbitrary density functions w.r.t. $dx_n$. In practice, the $\xi_{n,\theta}$
are critical to the efficiency of the algorithm and we return to this point in Section 3. We write the normalizing
constant as $\overleftarrow{p}_\theta(y_{n:T})$. One can use SMC to approximate this sequence of densities. We will sometimes denote the

index of a particle at time n by ^at, and we adopt the notation ten " = tzT^ 1 -'. 
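• Step 1: For $i \in \{1,\dots,N\}$ sample $\overleftarrow{X}_T^i \sim q_{T,\theta}(\cdot)$ and compute the un-normalized weight:

$$\overleftarrow{W}_T^i = \frac{\xi_{T,\theta}(\overleftarrow{x}_T^i) g_\theta(y_T | \overleftarrow{x}_T^i)}{q_{T,\theta}(\overleftarrow{x}_T^i)}.$$

For $i \in \{1,\dots,N\}$ sample $\overleftarrow{a}_T^i \in \{1,\dots,N\}$ from a discrete distribution on $\{1,\dots,N\}$ with $j$th probability
$\overleftarrow{w}_T^j = \overleftarrow{W}_T^j / \sum_{l=1}^{N} \overleftarrow{W}_T^l$; these represent the resampled particles. Set $n = T - 1$.

• Step 2: If $n = t - 1$ stop. Otherwise, for $i \in \{1,\dots,N\}$ sample $\overleftarrow{X}_n^i | \overleftarrow{x}_{n+1}^{\overleftarrow{a}_{n+1}^i} \sim q_{n,\theta}(\cdot | \overleftarrow{x}_{n+1}^{\overleftarrow{a}_{n+1}^i})$ and compute the
un-normalized weight:

$$\overleftarrow{W}_n^i = \frac{\xi_{n,\theta}(\overleftarrow{x}_n^i) g_\theta(y_n | \overleftarrow{x}_n^i) f_\theta(\overleftarrow{x}_{n+1}^{\overleftarrow{a}_{n+1}^i} | \overleftarrow{x}_n^i)}{\xi_{n+1,\theta}(\overleftarrow{x}_{n+1}^{\overleftarrow{a}_{n+1}^i}) q_{n,\theta}(\overleftarrow{x}_n^i | \overleftarrow{x}_{n+1}^{\overleftarrow{a}_{n+1}^i})}.$$

For $i \in \{1,\dots,N\}$ sample $\overleftarrow{a}_n^i \in \{1,\dots,N\}$ from a discrete distribution on $\{1,\dots,N\}$ with $j$th probability
$\overleftarrow{w}_n^j = \overleftarrow{W}_n^j / \sum_{l=1}^{N} \overleftarrow{W}_n^l$. Set $n = n - 1$ and return to the start of step 2.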

One can estimate the normalizing constant $\overleftarrow{p}_\theta(y_{t+1:T})$ by using a similar expression to (2); this estimate is
denoted $\overleftarrow{p}_\theta^N(y_{t+1:T})$.

2.3 Two Estimates of the Marginal Likelihood 

The objective here is to consider how one can use generalized two-filter smoothing to estimate the marginal
likelihood. One can consider [2, Proposition 3], which states that

$$p_\theta(y_{1:T}) = \int \pi_\theta(x_{t-1}, y_{1:t-1}) \overleftarrow{\pi}_\theta(x_t, y_{t:T}) \frac{f_\theta(x_t | x_{t-1})}{\xi_{t,\theta}(x_t)} dx_{t-1:t}.$$
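After some standard calculations, one has

$$p_\theta(y_{1:T}) = p_\theta(y_{1:t-2}) \overleftarrow{p}_\theta(y_{t+1:T}) \int q_{t-1,\theta}(x_{t-1} | x_{t-2}) q_{t,\theta}(x_t | x_{t+1}) W_{t-1}(x_{t-2:t-1}) W_t(x_{t:t+1}) \times$$

$$\pi_\theta(x_{t-2} | y_{1:t-2}) \overleftarrow{\pi}_\theta(x_{t+1} | y_{t+1:T}) \frac{f_\theta(x_t | x_{t-1})}{\xi_{t,\theta}(x_t)} dx_{t-2:t+1},$$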

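whence an SMC estimate of the marginal likelihood, obtained by running one filter forward up to time $t - 1$ and the
other backward to time $t$, with no resampling at the final time step of either filter, is

$$p_\theta^N(y_{1:T}) = p_\theta^N(y_{1:t-2}) \overleftarrow{p}_\theta^N(y_{t+1:T}) \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} W_{t-1}(\overrightarrow{x}_{t-2:t-1}^{i}) W_t(\overleftarrow{x}_{t:t+1}^{j}) \frac{f_\theta(\overleftarrow{x}_t^{j} | \overrightarrow{x}_{t-1}^{i})}{\xi_{t,\theta}(\overleftarrow{x}_t^{j})}.$$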

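This estimate is perhaps slightly undesirable as it has a computational cost of $O(N^2)$.
An alternative approach is to use the slightly modified representation

$$p_\theta(y_{1:T}) = p_\theta(y_{1:t-1}) \overleftarrow{p}_\theta(y_{t+1:T}) \int \pi_\theta(x_{t-1} | y_{1:t-1}) \overleftarrow{\pi}_\theta(x_{t+1} | y_{t+1:T}) \frac{f_\theta(x_{t+1} | x_t) f_\theta(x_t | x_{t-1})}{\xi_{t+1,\theta}(x_{t+1})} g_\theta(y_t | x_t) dx_{t-1:t+1}.$$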

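Again, after some simple manipulations, one arrives at the formula

$$p_\theta(y_{1:T}) = p_\theta(y_{1:t-2}) \overleftarrow{p}_\theta(y_{t+2:T}) \int \pi_\theta(x_{t-2} | y_{1:t-2}) \overleftarrow{\pi}_\theta(x_{t+2} | y_{t+2:T}) W_{t-1}(x_{t-2:t-1}) W_{t+1}(x_{t+1:t+2}) \times$$

$$q_{t-1,\theta}(x_{t-1} | x_{t-2}) q_{t+1,\theta}(x_{t+1} | x_{t+2}) \frac{f_\theta(x_{t+1} | x_t) f_\theta(x_t | x_{t-1})}{\xi_{t+1,\theta}(x_{t+1})} g_\theta(y_t | x_t) dx_{t-2:t+2}.$$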

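Now, if one runs the forward and backward SMC algorithms up to times $t - 1$ and $t + 1$ respectively, not
resampling at the very final time steps, one has the approximation:

$$p_\theta^N(y_{1:t-2}) \overleftarrow{p}_\theta^N(y_{t+2:T}) \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} W_{t-1}(\overrightarrow{x}_{t-2:t-1}^{i}) W_{t+1}(\overleftarrow{x}_{t+1:t+2}^{j}) \int \frac{f_\theta(\overleftarrow{x}_{t+1}^{j} | x_t) f_\theta(x_t | \overrightarrow{x}_{t-1}^{i})}{\xi_{t+1,\theta}(\overleftarrow{x}_{t+1}^{j})} g_\theta(y_t | x_t) dx_t.$$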
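This quantity can be approximated using the following procedure in [10]. Consider a conditional density $q_{t,\theta}(x_t | x_{t-1}, x_{t+1})$
and two sets of probabilities $\overrightarrow{\beta}_{t-1}^{i}$, $\overleftarrow{\beta}_{t+1}^{j}$, $i, j \in \{1,\dots,N\}$, with $\sum_{i=1}^{N} \overrightarrow{\beta}_{t-1}^{i} = 1$ and $\sum_{j=1}^{N} \overleftarrow{\beta}_{t+1}^{j} = 1$. Sample
index pairs $(i(1), j(1)), \dots, (i(N), j(N))$ using the $\overrightarrow{\beta}_{t-1}^{i}$, $\overleftarrow{\beta}_{t+1}^{j}$ and then, for each pair $l$, sample
$X_t^{l} | \overrightarrow{x}_{t-1}^{i(l)}, \overleftarrow{x}_{t+1}^{j(l)}$ from the distribution induced by $q_{t,\theta}(\cdot | \overrightarrow{x}_{t-1}^{i(l)}, \overleftarrow{x}_{t+1}^{j(l)})$, which leads to the estimate, which only costs $O(N)$:

$$p_\theta^N(y_{1:T}) = p_\theta^N(y_{1:t-2}) \overleftarrow{p}_\theta^N(y_{t+2:T}) \frac{1}{N} \sum_{l=1}^{N} \frac{W_{t-1}(\overrightarrow{x}_{t-2:t-1}^{i(l)}) W_{t+1}(\overleftarrow{x}_{t+1:t+2}^{j(l)}) f_\theta(\overleftarrow{x}_{t+1}^{j(l)} | x_t^{l}) f_\theta(x_t^{l} | \overrightarrow{x}_{t-1}^{i(l)})}{N^2 \xi_{t+1,\theta}(\overleftarrow{x}_{t+1}^{j(l)}) \overrightarrow{\beta}_{t-1}^{i(l)} \overleftarrow{\beta}_{t+1}^{j(l)} q_{t,\theta}(x_t^{l} | \overrightarrow{x}_{t-1}^{i(l)}, \overleftarrow{x}_{t+1}^{j(l)})} g_\theta(y_t | x_t^{l}). \qquad (3)$$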
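The saving from $O(N^2)$ to $O(N)$ rests on replacing a double sum over all particle pairs by an importance-weighted average over $N$ sampled pairs. The sketch below illustrates only that pairing idea on a generic double sum; it is a simplification under stated assumptions (it omits the additional draw of $X_t$ from $q_{t,\theta}$, and the function names, `a`, `b`, `h`, `beta_fwd` and `beta_bwd` are hypothetical stand-ins for the forward weights, backward weights, pairwise term and the two sets of selection probabilities).

```python
import numpy as np


def double_sum_exact(a, b, h):
    """O(N^2) reference: (1/N^2) * sum_i sum_j a[i] * b[j] * h(i, j)."""
    n = len(a)
    return sum(a[i] * b[j] * h(i, j) for i in range(n) for j in range(n)) / n**2


def double_sum_paired(a, b, h, beta_fwd, beta_bwd, rng):
    """O(N) estimate: draw N index pairs and importance-weight each sampled term."""
    n = len(a)
    i_idx = rng.choice(n, size=n, p=beta_fwd)  # forward indices i(1), ..., i(N)
    j_idx = rng.choice(n, size=n, p=beta_bwd)  # backward indices j(1), ..., j(N)
    terms = [
        a[i] * b[j] * h(i, j) / (n * beta_fwd[i] * n * beta_bwd[j])
        for i, j in zip(i_idx, j_idx)
    ]
    return float(np.mean(terms))


rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.5, size=20)
b = rng.uniform(0.5, 1.5, size=20)
estimate = double_sum_paired(a, b, lambda i, j: 1.0 / (1.0 + abs(i - j)),
                             a / a.sum(), b / b.sum(), rng)
```

Each sampled term has expectation equal to the full double sum, so the average is unbiased for any strictly positive choice of the selection probabilities; how close they are to the weighted pairwise terms governs the variance.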



2.4 Unbiasedness and Central Limit Theorem 

We will give some analysis of the estimate (3); we denote this estimate $p_\theta^N(y_{1:T})$. We make an assumption (A1)
which is detailed in the appendix. In addition, the notations for the expression of the asymptotic variance are also
defined in the appendix. For a function $\varphi : \mathbb{R}^d \to \mathbb{R}$ such that $\sup_{x \in \mathbb{R}^d} |\varphi(x)| < +\infty$, we write $\varphi \in \mathcal{B}_b(\mathbb{R}^d)$. $\mathcal{N}_d(\mu, \Sigma)$
denotes a $d$-dimensional normal distribution with mean $\mu$ and covariance $\Sigma$; if $d = 1$ the subscript $d$ is omitted.



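Theorem 2.1. We have

$$\mathbb{E}[p_\theta^N(y_{1:T})] = p_\theta(y_{1:T}) \quad \forall \theta \in \Theta.$$

In addition, assume (A1). Then for fixed $T > 2$, $t \in \{3,\dots,T-2\}$ and any $\theta \in \Theta$ we have that

$$\sqrt{N} \left( p_\theta^N(y_{1:T}) - p_\theta(y_{1:T}) \right) \Rightarrow Z_\theta$$

where $Z_\theta \sim \mathcal{N}(0, \sigma^2_{t,T}(\theta))$ with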

4t(0) = 4Mj^-iWfo./(v)])+4 w ,.fctH«A-^/(v))) 



t-i 



0% (cp) 

7 t-1,8 



L J. / 2 



9=1 

T-t-l 



g=0 



<9 T-g,t+l (V) - Vt- 9 ,9<5t 



(*>) 



Remark 2.1. Under some additional mixing conditions, one may establish that the asymptotic variance $\sigma^2_{t,T}(\theta)$,
when divided by $p_\theta(y_{1:T})^2$ (the asymptotic variance associated to a normalized estimate), obeys the following
inequality: $\sigma^2_{t,T}(\theta) / p_\theta(y_{1:T})^2 \leq C_1(\theta)(t - 1) + C_2(\theta)(T - t)$, where the first term is the error from the forward filter
and the second from the backward filter. Unfortunately, this provides little intuition on how to select $t$; it simply
implies that if the forward algorithm works better, one should choose $t$ large and vice versa.

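Remark 2.2. Let $\varphi : \mathbb{R}^{d_x} \to \mathbb{R}$, $\varphi \in \mathcal{B}_b(\mathbb{R}^{d_x})$, and consider $\mathbb{E}_\theta[\varphi(X_t) | y_{1:T}]$ where $3 \leq t \leq T - 2$ and the expectation
is w.r.t. the joint smoothing distribution, with density (1). Using the ideas in Section 2.3 one can show that an
estimator of $\mathbb{E}_\theta[\varphi(X_t) | y_{1:T}]$ is

$$\frac{1}{p_\theta^N(y_{1:T})} p_\theta^N(y_{1:t-2}) \overleftarrow{p}_\theta^N(y_{t+2:T}) \frac{1}{N} \sum_{l=1}^{N} \frac{W_{t-1}(\overrightarrow{x}_{t-2:t-1}^{i(l)}) W_{t+1}(\overleftarrow{x}_{t+1:t+2}^{j(l)}) f_\theta(\overleftarrow{x}_{t+1}^{j(l)} | x_t^{l}) f_\theta(x_t^{l} | \overrightarrow{x}_{t-1}^{i(l)})}{N^2 \xi_{t+1,\theta}(\overleftarrow{x}_{t+1}^{j(l)}) \overrightarrow{\beta}_{t-1}^{i(l)} \overleftarrow{\beta}_{t+1}^{j(l)} q_{t,\theta}(x_t^{l} | \overrightarrow{x}_{t-1}^{i(l)}, \overleftarrow{x}_{t+1}^{j(l)})} g_\theta(y_t | x_t^{l}) \varphi(x_t^{l}).$$

Denote the estimate as $p^N_{\theta,t}(\varphi) / p_\theta^N(y_{1:T})$ and set $\mathbb{E}_\theta[\varphi(X_t) | y_{1:T}] = p_{\theta,t}(\varphi) / p_\theta(y_{1:T})$. Standard calculations reveal
(e.g. [6, pp. 301]) that

$$\frac{p^N_{\theta,t}(\varphi)}{p_\theta^N(y_{1:T})} - \frac{p_{\theta,t}(\varphi)}{p_\theta(y_{1:T})} = \frac{p_\theta(y_{1:T})}{p_\theta^N(y_{1:T})} p^N_{\theta,t}\left( \frac{1}{p_\theta(y_{1:T})} \left[ \varphi - \frac{p_{\theta,t}(\varphi)}{p_\theta(y_{1:T})} \right] \right).$$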

Now, upon inspection of the proofs in the appendix, one can easily deduce: 
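• $p_\theta(y_{1:T}) / p_\theta^N(y_{1:T})$ will converge in probability to 1.

• Let $\tilde{\varphi} = \frac{1}{p_\theta(y_{1:T})} \left[ \varphi - \frac{p_{\theta,t}(\varphi)}{p_\theta(y_{1:T})} \right]$, then

$$\sqrt{N} p^N_{\theta,t}\left( \frac{1}{p_\theta(y_{1:T})} \left[ \varphi - \frac{p_{\theta,t}(\varphi)}{p_\theta(y_{1:T})} \right] \right) \Rightarrow Z_\theta(\tilde{\varphi})$$

where $Z_\theta(\tilde{\varphi}) \sim \mathcal{N}(0, \sigma^2_{t,T}(\tilde{\varphi}))$,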

< T (0) = 4M,,(^iW^Hi J ./?(v)])H w fc^v(^<-iWv))) 

and for $(x_{t-1}, x_{t+1}) \in \mathbb{R}^{2d_x}$

$$I_{gf\tilde{\varphi}}(x_{t-1}, x_{t+1}) = \int g_\theta(y_t | x_t) \tilde{\varphi}(x_t) f_\theta(x_{t+1} | x_t) f_\theta(x_t | x_{t-1}) dx_t.$$

See the appendix for further definitions of the notations. 

Hence, on using Slutsky's Lemma, one has a univariate CLT for an approximation of $\mathbb{E}_\theta[\varphi(X_t) | y_{1:T}]$. One can follow
the ideas of [6, pp. 301-302] to prove a multivariate CLT. It may be possible to compare this estimate (through the
asymptotic variance) relative to the one produced by the forward filtering backward smoothing algorithm; see £/J 

Remark 2.3. The unbiased property allows one to use the SMC approximation of the generalized two-filter repre- 
sentation within a particle Markov chain Monte Carlo ^ algorithm. In \1V$ . we have established an appropriate 
target distribution in this context. 

3 Numerical Examples 

3.1 Measuring the New Estimate's Sensitivity to t 

Consider the linear Gaussian model provided in Section 4 of [10]: $X_0 \sim \mathcal{N}_2(\mu_0, \Sigma_0)$, $X_{n+1} | (X_{1:n} = x_{1:n}, Y_{1:n} = y_{1:n}) \sim \mathcal{N}_2(F x_n, Q)$,
$Y_n | (X_{1:n} = x_{1:n}, Y_{1:n-1} = y_{1:n-1}) \sim \mathcal{N}(G x_n, R)$, with

$G = (1, 0)$, $R = \tau^2$,

Mi Q ^{\ \) 

We ran the two-filter SMC algorithm to calculate the marginal likelihood via (3) for the instance where $T = 300$.
Our objective is to observe how the choice of $t$ affects the accuracy and precision of the new estimate. A Kalman
filter is used to allow us to choose $\xi_{n,\theta}(x_n) = \pi_\theta(x_n | y_{1:n-1})$ (the predictor); this corresponds to an extremely
favourable choice (indeed one recovers the FFBS procedure, when considering the smoother). In all simulations, we
used the optimal importance distributions and $\beta$ resampling weights as in Appendix A of [10]. We set $N = 300$.
We ran nine versions of the two-filter algorithm, with $t \in \{T/10, 2T/10, \dots, 9T/10\}$. In each case, we plotted the
variability of the estimate and compared (3) to the marginal likelihood provided by the Kalman filter. We
ran many simulations for different pairs of values of the state noise, $\nu^2$, and the observation noise, $\tau^2$; specifically,
we looked at 225 possible pairings where $\nu^2$ and $\tau^2$ each ranged from 1 to 98. The results are displayed in Figure 1.
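For a linear Gaussian model of this form the marginal likelihood is available exactly through the Kalman filter, which is what makes it a convenient benchmark. The following is a minimal sketch of that computation via the prediction-error decomposition; the function name is hypothetical and the numerical values of `F`, `Q`, `R`, `mu0`, `Sigma0` and the observations are placeholders chosen for illustration, not the settings used in the experiments.

```python
import numpy as np


def kalman_log_likelihood(y, F, Q, G, R, mu0, Sigma0):
    """log p(y_{1:T}) for X_0 ~ N(mu0, Sigma0), X_{n+1} | x_n ~ N(F x_n, Q),
    Y_n | x_n ~ N(G x_n, R); y has shape (T, d_y)."""
    mean = np.asarray(mu0, dtype=float)
    cov = np.asarray(Sigma0, dtype=float)
    log_lik = 0.0
    for y_n in np.asarray(y, dtype=float):
        # Prediction step: law of X_n given y_{1:n-1}.
        mean = F @ mean
        cov = F @ cov @ F.T + Q
        # Innovation and its covariance give the predictive density of y_n.
        innov = y_n - G @ mean
        S = G @ cov @ G.T + R
        _, logdet = np.linalg.slogdet(S)
        quad = innov @ np.linalg.solve(S, innov)
        log_lik += -0.5 * (innov.size * np.log(2.0 * np.pi) + logdet + quad)
        # Update step: law of X_n given y_{1:n}.
        gain = cov @ G.T @ np.linalg.inv(S)
        mean = mean + gain @ innov
        cov = cov - gain @ S @ gain.T
    return log_lik


# Placeholder parameter values (assumptions for illustration only).
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.5 * np.eye(2)
G = np.array([[1.0, 0.0]])
R = np.array([[2.0]])
mu0 = np.array([0.1, -0.2])
Sigma0 = np.eye(2)
y_demo = np.array([[0.7], [1.1], [0.4]])
log_lik_demo = kalman_log_likelihood(y_demo, F, Q, G, R, mu0, Sigma0)
```

Comparing the logarithm of an SMC estimate against this value, over repeated runs, gives the kind of accuracy and variability summary described above.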

We found the same phenomenon across all pairs of values of $\nu^2$ and $\tau^2$. There is an increase in the variance
of (3) as $t$ approaches $T$ (i.e., when the new estimate relies more on the forward filter and less on the backward
filter). Furthermore, we see the accuracy of the estimate fall as $t$ approaches $T$. These results are in accordance
with the degeneracy measures of the two filters. The forward filter's effective sample size (ESS) stays around 200,
while the backward filter's ESS never drops below $N = 300$ (due to the choice of $\xi_{n,\theta}$ and the various proposals
adopted). The choice for $\xi_{n,\theta}$ ensures the backward filter's consistently strong performance. This suggests that
a better algorithm may result from removing any dependence on the forward filter and running a backward filter
(with $\xi_{n,\theta}(x_n) = \pi_\theta(x_n | y_{1:n-1})$) from time $T$ to 1; this point is discussed in Section 4. Note that, in comparison to
the standard SMC estimate, the results for the new estimate (for this model and the current settings) were superior
w.r.t. the variability of the estimate (results not shown).
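The degeneracy measure referred to here can be computed directly from the un-normalized weights. A minimal sketch, assuming the usual definition $\mathrm{ESS} = (\sum_i W^i)^2 / \sum_i (W^i)^2$ (the function name is hypothetical):

```python
import numpy as np


def effective_sample_size(weights):
    """ESS = (sum_i W^i)^2 / sum_i (W^i)^2 for un-normalized weights W^i."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)
```

With equal weights the ESS equals the number of particles $N$, and it falls towards 1 as the weight concentrates on a single particle.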
3.2 Comparing the Two-filter Decomposition to FFBSi



Remark 2.2 above parallels a similar result shown in [S] for another $O(N)$ SMC smoothing approximation based on
FFBS. To explore this point further, we used the same example from [10] to compare the two-filter SMC algorithm
to the FFBSi algorithm in [S]. We used both algorithms to calculate the expected value of the state of the hidden
process given $y_{1:T}$, $T = 300$, at time $t \in \{T/10, \dots, 9T/10\}$. Both algorithms utilized $N = 300$ particles. Note that
FFBSi relies on rejection sampling, and so, due to its stochastic running time, it is difficult to match the
computation times exactly. Again, 50 simulations per algorithm per $(\tau^2, \nu^2)$ pair for 225 pairs are run; see Figure 2.
We found that the two algorithms gave very similar results, although the two-filter decomposition yielded estimates of
lower variance (see Figure 2). This is especially true at lower values of $t$, where, as above, the backward filter has
more influence on the two-filter estimate.




[Figure 1 panels; horizontal axis: meeting time, t.]




Figure 1: We present the output for some pairs of $\nu^2$ and $\tau^2$. The circles, whose scale is on the left, give the 50
simulated values of the logarithm of the marginal likelihood per time point. The solid line, whose scale is on the
right, measures the ratio of the variance of these 50 values at each time point to the variance at the previous time
point. The dotted line gives the logarithm of the marginal likelihood as provided by the Kalman filter.



4 Discussion 

In this note, we introduced a new $O(N)$ estimate of the marginal likelihood using the generalized two-filter
decomposition. We established that this estimate is unbiased and proved a CLT, under some assumptions. Numerical
examples suggested that the new estimate is sensitive to changes in the meeting point of the forward and backward
filters. When choosing $\xi_{n,\theta}(x_n) = \pi_\theta(x_n | y_{1:n-1})$, the backward filter significantly outperforms the forward filter
and it may be beneficial to remove the forward filter from the estimation procedure. However, one can seldom
make this choice for the $\xi_{n,\theta}$, and so we would like to approximate them. In joint work with Prof. A. Doucet, we
Figure 2: We present the output for some pairs of $\nu^2$ and $\tau^2$. At each time point, the black dots (left) represent
50 simulated expected values from the two-filter algorithm and the blue dots (right) represent 50 estimates from
FFBSi. The first component of $\mathbb{E}[X_t | y_{1:T}]$ is on top, and the second component of $\mathbb{E}[X_t | y_{1:T}]$ is on the bottom.



are exploring a smoothing algorithm where one introduces a discrete valued auxiliary variable $J \in \{1,\dots,N\}$ and
considers the sequence of extended backward targets (where we condition upon the particles from a forward SMC
algorithm)

$$\overleftarrow{\pi}_\theta(j, x_{n:T} | y_{1:T}) \propto f_\theta(x_n | \overrightarrow{x}_{n-1}^{j}) g_\theta(y_n | x_n) \prod_{k=n+1}^{T} g_\theta(y_k | x_k) f_\theta(x_k | x_{k-1}), \quad n \in \{t,\dots,T\}.$$

The idea is to approximate the $\xi_{n,\theta}$ that are used above, via the forward filter.
A Proof of the CLT

Here we describe a Feynman-Kac representation, which is used in the proof of the CLT. Let $t \in \{3,\dots,T-2\}$, with
$T > 2$ also fixed.

Define the forward Feynman-Kac un-normalized $n$-time marginal, $n \in \{1,\dots,t-1\}$:

$$\gamma_{n,\theta}(dx_n) = \int \left[ \prod_{p=1}^{n-1} W_p(x_p) M_p(x_{p-1}, dx_p) \right] M_n(x_{n-1}, dx_n)$$
with $x_p = (x'_p, x_p) \in \mathbb{R}^{2d_x}$, $M_1(x_0, dx_1) = \delta_{x_0}(dx'_1) q_{1,\theta}(x_1 | x'_1) dx_1$ and

$$M_p(x_{p-1}, dx_p) = \delta_{x_{p-1}}(dx'_p) q_{p,\theta}(x_p | x'_p) dx_p.$$

The normalized operator is 

We also define the forward semi-group operator: 



/lis -L 
J| ^V g (a;g)M ?+ i(a;g,rfa;g + i) 



with 1 < p < n <t — 1. The selection mutation operator: 

*.<*.-,,)(•) - ,e(o,..., ( -l} 

with the conventions ~$i(~rfo,o) = ~^i,e- 

Define the backward Feynman-Kac un-normalized $n$-time marginal, $n \in \{0,\dots,T-t-1\}$:



/-T-n-1 
n ^ 
- P (xT- P )M T -p(xT-p+i,dxT-p) M n (x n+ i,dx n ) 
- p=0 



with x„ = (a4,ar n ) € R 2dx , Mxidxr) = qT(xT)dxTS x (dx' T ), x e M dx an arbitrary point 

M n (x n+1 ,dx n ) = q n .6{x n \x' n )dx n 8 Xn+1 (dx' n ) n e {i + 1, . . . , T - 1}. 
The normalized operator \r-n,e = ^T-n,e{dx n )/ < ^' r-n,e(l)- Also define the semi-group operator 

^ p— n— 1 



1 [ ^^p— s s)-^^p— s— 1 (*£p— s : dXp— s _ x ) 

s=0 



with T>p>n>i+1. Also 



tb rtr \n Vr- g +i,fl(^r- g +iM r - 9 (-)) , , 
*t-,(>|t-,+i,d)(') = z: ^= : ge{0,...,T-t-l} 

»?T-g+l,e(VVT-g+l) 



and $r( 77 t+i) =Vt- 
We will use the notation 



^/(it-i,^t+i) = / 5e(yt|a; t )/ e (x t+ i|x t )/ e (a; t |x t _i)dx t 
t^ t+ i(a; t+ i) 



with 



^itfvi+rft-iAvtt-iIgf {-,.))) = J ^+i(dx t+1 )^ t+1 (x t+1 )[J ^ t -i,e(dx t -i)\^ t -i(xt-i)Igf(xt-i,xt+i)] 



for a— finite measures Ht-\,^t+i- 

Using the above notations, we can write 



N 7& 



1=1 ^ t.t+i,e\ x t+1 ) p t _ 1 p t+1 qt.fi{x t \ x t _ x , x t+1 ) 
with

p— 1 i= 1 
T—t—2 N 

vf +2 (D = n ^e^-pc^t-,). 

p=0 i=l 

To prove the central limit theorem (CLT), we make use of the following assumption, which is similar to (H)$_m$
($m = 2$) of [5]. It is used to control remainder terms when constructing a CLT. It implies that the backward
Markov proposal kernels mix very quickly.

(Al) 1. The incremental weights all satisfy: 

W (r) W (r) 

VI < n < t - 1 S g = sup ' < oo Vt + 1 < n < T 5 e = sup > " V ; < oo 
x >v W n (y) x,v W n (y) 

For each 9 £ there exist < C_ g < Cg < oo such that for every x, x' £ M dx , n € {1, . . . , T}, y n € K d " 

C e < f e (x'\x) <C e C e < t, nfi {x) <C e C e < g s {y n \x) < Cg. 

In addition, for each £ <d, there exist < C_ g < Cg < oo as above, such that for each Xt, Xt-i, x t £ K dx , 
i£{l,...,N} 

C e <q t ,e{x t \x t ^x t+ x)<Cg Cjg<%_ x <Ce C e <^\ +1 <Cg. 

2. For m = 2 and some sequence of numbers uj p m ^ £ [1, oo) such that for each p £ { — 1, . . . , T — t — to} and 

any (x,x') £ R 2dx we have 

M T _ PiT _ p _ OT (a;,dj/) < Lo p m) M T - p ,T-p-m(x' , dy) 
where M Piq — M p —\ . . . M q , p > q. 
Proof of Theorem \2.1\ We have that: 



^ and £ t+ 

and t + 1 respectively. We can use the decomposition of [6_ to obtain the following formula 



where and are the filtrations generated by the forward and backward particle systems up to time $t - 1$



where 



^{yi-.T)0ti ® ^t+i] - Pe(j/i:T) = a(iV) + £(iV) + fl(JV) 



I— i. 

«W = E^W^-^^-^K^t-iI^-iWiCHi^/Cv))]) 

9=1 

r-t-i 

/3(iV) = £ V?-,(1)[V?-, - $r-,(V?-,-i)](^T- 5!i [4i^-i(^-i^/('. ■))]) 

9=0 
t-1 

fi(^v) = E TWtf ? - ^ 9 (^-i)](4,*-i[^i[V f + i - v. + i,»]fei^(' 1 •))])• 



9=1 

It is straightforward to verify that the expectation of this quantity is exactly zero, which establishes the unbiased
property.

By using the Marcinkiewicz-Zygmund inequality

E[\VN^( yuT ) E\p^(y 1:T )0l 1 ® §?f +1 ])|] < ^[1^(1)^(1)1] 



9 



for some $C_\theta < +\infty$. For any fixed $t$, $T$, $\sup_{N \geq 1} \mathbb{E}[\gamma^N_{t-1}(1)^2]^{1/2} < \infty$ and $\sup_{N \geq 1} \mathbb{E}[\gamma^N_{t+2}(1)^2]^{1/2} < \infty$ (see the proof
of Lemma A.1), thus, via Cauchy-Schwarz, we can deduce that (note that $\to_{\mathbb{P}}$ denotes convergence in probability)



The weak convergence of $\sqrt{N}\alpha(N)$ and $\sqrt{N}\beta(N)$ can be obtained by the independence of the terms and [5,
Proposition 9.4.1]. By Lemma A.1 the remainder $\sqrt{N}R(N)$ converges to zero in probability and we can conclude
the result. □

Lemma A.1. Assume (A1). Then for fixed $T > 2$, $t \in \{3,\dots,T-2\}$ we have that



NR(N) = VN^ZWtf? - n«-i)](4-i[^ilVm - W,*](H-i J */(-» '))]) ^ 0. 



9=1 



Proof. To shorten the subsequent notations, define: 

C-iW = ^ 9 ,t-i[^*-i[Vt+i-W,e](^+i^/(v)](a:) 

= SUp^x^SUp^^id^-Vt+l^^+l^/CODW- 

a: cc 

It is remarked that for any bounded function $\varphi$, $\sup_x Q_{q,t-1}(\varphi)(x) < \infty$ by assumption.



We will now show that $\sqrt{N}R(N)$ goes to zero in $L_1$. To that end, we can consider the expectation of each
summand in the series for $R(N)$. We have



E 



Sg,t-1 ' 



< E 



S<7,i-1 



1/2 



where we have used Cauchy-Schwarz. For the first expectation on the R.H.S. one can condition on ^"£_x <S> ^t+i 
and apply the Marcinkiewicz-Zygmund inequality (noting that sup^. \^t-i( x )\/^t-i is upper-bounded by a finite
deterministic constant) to obtain that 



E 



iV 



2-i 1/2 

< -^ E tff(l) 2 ]V 2 . 



Note that for each g, Epf^ (l) 2 ] 1 / 2 < 00 (e.g. [5, Corollary 5.2], or by using the upper-bound on the W^ n ). 
Now, we move onto the expression E[(^ t _ 1 ) 2 ] 1 / 2 . From the definition of .1 , we have that 

E[(£l t -i) 2 ] 1/2 = sup^^sup^ 

X X 

where Q q ,t-i(:)(x) '■— sup x Q q } t-i(')(%)/ swp x Q qj-i('){%)- Application of Jensen's inequality and Fubini leads to 
E[(^!) 2 ] 1/2 < sup^ t _ 1 (x) S u P 4 lt _ 1 (l)( a: )Q,, t _ 1 (E[|[^ 1 - W,«](^4/(-'))f]) 1/2 

X X > ' 

Then by Theorem 5.1, Corollary 5.2] (it is remarked that the corollary of that paper can be adapted to deal 



N 



when 7 t+i integrates a bounded function), it follows for N large enough relative to T — t (we will take N to infinity 

some finite const; 

E[(^-i) 2 ] 1/2 < BU P Gt_i(x)supg g)t -i(l)(s) 

c(r,t) 

□ 



and T — t is fixed) there exist some finite constant C(T,t) that depends upon T, t but not g or x t _i such that 

^KSg.t-i; J ' ^ suptri_nx; sup i^g^-ni^a;; 

Hence we have that: 



AE[|i?(A0|] < 



N 



where $C(T, t)$ is some finite constant that may grow with $T$. We thus conclude as $T < \infty$.